Learning about KnowledgeMiner the Easy Way


KnowledgeMiner is probably unlike any program you have ever worked with before. You know what to do with, say, a word processor. But what do you do with a data mining program? We will show you just that. If you're a techie, you can find lots of in-depth information in the KnowledgeMiner Tutorial in the Help menu. However, if you're a regular person, you may want to learn about it the easy way. And that is what this page is about.

It is a simple deal. You provide some raw data, and we'll show you how to get the most out of it. Using your data, KnowledgeMiner can build a model that lets you make predictions. And here is how.

Part One: Getting started

Before KnowledgeMiner can be of use to you, it has to learn. However, there is no need for you to teach KnowledgeMiner anything. Using the data, it will teach itself, deducing relationships a mathematician would have a hard time finding. All you need is:

  • Source data
  • Target data.

Using this, KnowledgeMiner will find hidden relationships all by itself, allowing it to make predictions using new source data you provide.

KnowledgeMiner can be used in a wide range of fields. Some examples are given here, but the range is so wide that your area of interest is more likely than not missing from these examples.

  • If you're interested in stocks, KnowledgeMiner may help you to predict the rise and fall of shares.
  • If you're interested in medicine, KnowledgeMiner may help you to predict the life expectancy of cancer patients.
  • If you're interested in chemistry, KnowledgeMiner may help you to predict the properties of new molecules.

Actually, KnowledgeMiner can be applied in just about any field that involves numbers and unknown relationships. An example of how to work with KnowledgeMiner is given below. We really hope it is outside your area of interest and expertise. That would be the best demonstration that you do not need expertise in a field to make valid predictions. From the example you will see how to apply what you've learned to your own data.

O.K., let's get on with it. Remember, you do not have to know anything about the application area, which here is physics and chemistry. Below you see a table. Each row contains information on a certain hydrocarbon (gasoline is a mixture of many such molecules).

The objective is to predict the boiling point of a hydrocarbon using nothing more than two series of data: 1) the number of carbon atoms of a hydrocarbon; and 2) the molecular weight of that hydrocarbon. To teach KnowledgeMiner, a third set of data is necessary: a series of boiling points for these hydrocarbons. Thus, the columns with the number of carbon atoms and the molecular weights represent the source data, and the column with boiling points represents the target data.

Predicting the boiling point of hydrocarbons

Name of hydrocarbon   No. of carbon atoms   Boiling point   Molecular weight   Actual boiling points
Methane                1                    -164.0           16.04
Ethane                 2                     -88.6           30.07
Butane                 4                      -0.5           58.12
Hexane                 6                      69.0           86.18
Heptane                7                      98.4          100.21
Nonane                 9                     150.8          128.26
Decane                10                     174.1          142.29
Dodecane              12                     216.3          170.34
Octane                 8                                    114.23             125.7
Pentane                5                                     72.15              36.1
Propane                3                                     44.11             -42.1

Boiling points are given in degrees Celsius.

In general terms: for any project you will need two (or more) columns of source data and one column with target data.
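If you like to think in code, the layout of source and target data can be sketched in a few lines of Python. This is only an illustration and not part of KnowledgeMiner. The names TRAINING_ROWS, PREDICTION_ROWS and split_source_target are made up for this sketch, and the molecular weights are rounded to two decimals.

```python
# Rows from the hydrocarbon table:
# (name, number of carbon atoms, molecular weight, boiling point in deg C).
# Molecular weights are rounded to two decimals (an assumption of this sketch).
TRAINING_ROWS = [
    ("Methane", 1, 16.04, -164.0),
    ("Ethane", 2, 30.07, -88.6),
    ("Butane", 4, 58.12, -0.5),
    ("Hexane", 6, 86.18, 69.0),
    ("Heptane", 7, 100.21, 98.4),
    ("Nonane", 9, 128.26, 150.8),
    ("Decane", 10, 142.29, 174.1),
    ("Dodecane", 12, 170.34, 216.3),
]

# Rows whose boiling point is left blank so that it can be predicted later.
PREDICTION_ROWS = [
    ("Octane", 8, 114.23),
    ("Pentane", 5, 72.15),
    ("Propane", 3, 44.11),
]


def split_source_target(rows):
    """Separate the source columns (inputs) from the target column (output)."""
    source = [(carbons, weight) for _, carbons, weight, _ in rows]
    target = [boiling for _, _, _, boiling in rows]
    return source, target


source, target = split_source_target(TRAINING_ROWS)
print(len(source), "training rows,", len(source[0]), "source columns, 1 target column")
```

The eight training rows carry both source and target values. The three remaining rows carry source values only, which is exactly the situation in which a prediction is useful.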

Without quitting Netscape Navigator (you probably don't want to miss this page!), open KnowledgeMiner. Click the close box of the window that opens automatically and open the folder Chemistry/Ecology (it is in the Examples folder). Open the file BoilingPoint. There you see all the data from the table shown above.

Now take the following steps.

Firstly, you have to tell KnowledgeMiner which is the column with the target data. To that end, click the cell labeled 'Boiling point'. Please note that in the top row (listing variables as X1, X2, X3 ...) the heading of this column now reads Y. That is the indication that this column contains the target data used for learning.

Your screen should look as depicted below, although the highlight color used to select the cell may differ from the one shown here. (You can change the highlight color of the selection in the Appearance control panel.)

 

Secondly, you have to tell KnowledgeMiner which columns contain the source data, that is, which columns you want KnowledgeMiner to use to build the model. To that end, hold down the Command key (next to the spacebar, the one with the clover leaf and/or Apple symbol on it) and click the cells labeled 'No. of carbon atoms' and 'Molecular weight'. As you have to select at least 2 columns and there are only 2 to choose from, this may seem a bit silly in this particular case, but do it anyway. Now your screen should look like this.

 

Thirdly, you have to tell KnowledgeMiner to start building the model. Choose Create Input-Output-Model from the Modeling menu. Now you will see a message box named Input-Output-Model: Settings.

 

On the left you can verify that the Output variable is the boiling point, and that the Input variables are the number of carbon atoms and the molecular weight, respectively.

We are going to use 8 rows of data as source data. In the box Input data, enter 8 as the data length. (Note: As we are not building time-dependent models, the maximum time lag is 0. We will get into time-dependent modeling later.)

You can also choose between a linear model and a nonlinear model. Do not worry about making a wrong choice. If you have no idea, just choose nonlinear permissible. If a linear model were to give the best results and you unknowingly opted for nonlinear models, there would be no problem: KnowledgeMiner would end up with the best linear model anyway. So it is best to choose nonlinear permissible, although it may take your Mac a little longer to deduce the optimum model.

Finally, to build the model, click Modeling (or hit the return key). KnowledgeMiner starts building the model, and as soon as it is ready, it presents you with a graph showing both the original (target) data and the data approximated by the model. The more the red and blue lines overlap, the better the model.

If you experiment with both the linear and the nonlinear setting, you will notice that the nonlinear model gives better results here.
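To get a feel for the difference between a linear and a nonlinear model, here is a rough Python sketch that fits a straight line and a quadratic curve to the eight boiling points, using the number of carbon atoms as the only input. It uses ordinary least squares, which is an assumption made for illustration and not the method KnowledgeMiner applies internally. All function names are hypothetical. Molecular weight is left out because, for these hydrocarbons, it rises almost in lockstep with the number of carbon atoms.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    size = degree + 1
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(matrix, rhs)


def evaluate(coefficients, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** power for power, c in enumerate(coefficients))


def squared_error(coefficients, xs, ys):
    """Sum of squared differences between the model and the target data."""
    return sum((evaluate(coefficients, x) - y) ** 2 for x, y in zip(xs, ys))


# The eight training rows: number of carbon atoms and boiling point (deg C).
carbons = [1, 2, 4, 6, 7, 9, 10, 12]
boiling = [-164.0, -88.6, -0.5, 69.0, 98.4, 150.8, 174.1, 216.3]

linear = fit_polynomial(carbons, boiling, 1)
quadratic = fit_polynomial(carbons, boiling, 2)

error_linear = squared_error(linear, carbons, boiling)
error_quadratic = squared_error(quadratic, carbons, boiling)

print("linear model error:   ", round(error_linear, 1))
print("quadratic model error:", round(error_quadratic, 1))
```

The smaller the error, the more closely the model curve follows the target data, which is the same thing the overlap of the two lines in the graph window tells you.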

Part Two: Making predictions

There are two ways to make predictions. (You must have followed the three steps described in Part One.)

I) Using a spreadsheet program.

The first way is to use a spreadsheet program. Unlike a neural network program, KnowledgeMiner tells you the relation it discovered. Just choose Model Equation from the Window menu. You can enter the equation displayed there into a spreadsheet program. (Note: In a formula, e stands for 'times ten to the power of'. For example, 1e3 equals 1000. If your formula reads + 5.74e+1X1, it means 57.4 * X1.)
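If you are unsure how to read the e notation, most programming languages and spreadsheet programs understand it directly. The small Python sketch below shows the conversion. The helper term_value is a made-up name for this illustration, and the value 2.0 for X1 is just an example input.

```python
def term_value(coefficient_text, x1):
    """Evaluate one term of a model equation: coefficient times X1."""
    return float(coefficient_text) * x1


print(float("1e3"))                 # 1000.0
print(float("5.74e+1"))             # 57.4
print(float("2.5e-3"))              # 0.0025
print(term_value("5.74e+1", 2.0))   # 114.8
```

A negative exponent works the same way in the other direction: 2.5e-3 means 2.5 divided by 1000.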

II) Using KnowledgeMiner.

There is no need for a spreadsheet program, however. KnowledgeMiner can do it for you too. Here's how.

Choose What-If-Prediction from the Modeling menu. You are presented with a message box named What-If-Prediction. Enter the number of rows for which you want a prediction. Here there are three rows whose boiling points you want to predict, so enter 3 as the Forecast Horizon. You probably want the predicted data to be put into the respective cells, so check the second check box.

The calculated data points are shown in the graph window. If you bring the data window to the front, you will see the predicted data in red. For comparison, the actual data are shown in the column to the right.
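The idea behind a what-if prediction can also be tried outside KnowledgeMiner. The Python sketch below fits a quadratic curve to the eight training rows and then fills in the boiling points of the three remaining hydrocarbons, printing them next to the actual values. It assumes a least-squares fit on the number of carbon atoms only, which is not the model KnowledgeMiner builds, so the numbers will differ from what you see on your screen. The names fit_quadratic, predict and WHAT_IF are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x*x via the normal equations."""
    size = 3
    # Augmented matrix of the normal equations: sums of powers of x plus right-hand side.
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    # Back substitution.
    coeffs = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(rows[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = (rows[r][size] - tail) / rows[r][r]
    return coeffs


# Training data: number of carbon atoms and boiling point (deg C).
TRAIN_CARBONS = [1, 2, 4, 6, 7, 9, 10, 12]
TRAIN_BOILING = [-164.0, -88.6, -0.5, 69.0, 98.4, 150.8, 174.1, 216.3]

# Rows to predict: (name, number of carbon atoms, actual boiling point for comparison).
WHAT_IF = [("Octane", 8, 125.7), ("Pentane", 5, 36.1), ("Propane", 3, -42.1)]

a, b, c = fit_quadratic(TRAIN_CARBONS, TRAIN_BOILING)


def predict(carbons):
    """Apply the fitted model to a new source value."""
    return a + b * carbons + c * carbons ** 2


forecast_horizon = len(WHAT_IF)  # three rows, as in the What-If-Prediction box
predictions = [predict(n) for _, n, _ in WHAT_IF[:forecast_horizon]]

for (name, _, actual), predicted in zip(WHAT_IF, predictions):
    print(f"{name:8s} predicted {predicted:7.1f}   actual {actual:7.1f}")
```

Comparing the two printed columns plays the same role as comparing the red predicted values with the actual values in the data window.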

If you want the actual data in the graph window as well, you will have to put them below the data to be predicted. Then, before choosing What-If-Prediction, click the first row as shown below and choose Original Data Begins In This Row from the Table menu.

You continue with What-If-Prediction as described above, and you will get to see both the predicted and the actual data in a single graph.

more to come...

Thanks to Bert Altenburg